1 Data Preprocessing: chile_data1

First 5 dimensions:

01 CULTURAL HERITAGE AND EVENTS

02 NATURAL RESOURCES AND SUSTAINABILITY

04 TOURISM INFRASTRUCTURE

05 TOURISM MOBILITY AND TRANSPORTATION INFRASTRUCTURE

After replacement and transformation, there are NaNs in 4 columns:

Ideally, we would request the missing values from the client: because the dataset is small, every value matters.

After inspecting the data, I found that the regions with NaNs in '% OF LAND THAT CORRESPONDS TO FORESTS' largely coincide with those with NaNs in 'LAND AFFECTED BY WILDFIRES'. I therefore assume that these regions have a negligible amount of forest land and, consequently, no land affected by wildfires, so I replace the NaNs in these two columns with 0.

For the other two columns, I will impute the missing values with the mean of each column.

2 Data Preprocessing: chile_data2

Last 5 dimensions:

07 SECURITY AND SAFETY

08 ECONOMIC PERFORMANCE

09 TOURISM PROMOTION

10 GOVERNMENTAL INVOLVEMENT AND EFFICIENCY

After replacement and transformation, there are NaNs in 8 columns:

Again, two strategies will be applied to clean these missing values.

Firstly, I will replace NaNs in 'Ski resorts', 'Major shopping centers', and 'Number of vineyards' with 0. These columns represent higher-end amenities and contain no 0s, so I assume that a NaN here means the region does not have the amenity.

Secondly, I will replace the remaining NaNs with the mean of their respective columns. These columns represent indexes that should apply to every region, so I assume the NaNs stem from errors during data collection, which makes mean replacement the more appropriate approach.
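The same two-strategy pattern for `chile_data2`, sketched under the assumption that every numeric column still containing NaNs, apart from the three amenity columns, should be mean-filled. `REGION` and `Crime index` are hypothetical column names used only for the toy data.

```python
import numpy as np
import pandas as pd

# Amenity columns where NaN is read as "the region has none" (names taken from the text).
ZERO_FILL_COLS = ["Ski resorts", "Major shopping centers", "Number of vineyards"]


def impute_chile_data2(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df: 0 for the amenity columns, column mean for other numeric NaN columns."""
    out = df.copy()
    zero_cols = [c for c in ZERO_FILL_COLS if c in out.columns]
    if zero_cols:
        out[zero_cols] = out[zero_cols].fillna(0)
    # Remaining numeric columns with gaps are treated as indexes that apply to every region.
    numeric = out.select_dtypes(include="number").columns
    mean_cols = [c for c in numeric if c not in zero_cols and out[c].isna().any()]
    if mean_cols:
        out[mean_cols] = out[mean_cols].fillna(out[mean_cols].mean())
    return out


# Toy data; "REGION" and "Crime index" are hypothetical column names.
chile_data2 = pd.DataFrame({
    "REGION": ["R1", "R2", "R3"],
    "Ski resorts": [2.0, np.nan, 1.0],
    "Major shopping centers": [np.nan, 3.0, 1.0],
    "Number of vineyards": [10.0, np.nan, np.nan],
    "Crime index": [40.0, np.nan, 60.0],
})
clean2 = impute_chile_data2(chile_data2)
print(clean2)
```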

3 Data Preprocessing: combine chile_data1 and chile_data2
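The join itself is not spelled out here. A minimal sketch, assuming both cleaned tables share a region identifier column (called `REGION` here, a hypothetical name) and that each region appears exactly once per table:

```python
import pandas as pd


def combine_chile_data(d1: pd.DataFrame, d2: pd.DataFrame, key: str = "REGION") -> pd.DataFrame:
    """Join the two cleaned tables on a shared region key (hypothetical column name)."""
    # validate="one_to_one" raises pandas.errors.MergeError if a region appears twice in either table.
    merged = pd.merge(d1, d2, on=key, how="inner", validate="one_to_one")
    # An inner join silently drops unmatched regions, so check that nothing was lost.
    if len(merged) != len(d1) or len(merged) != len(d2):
        raise ValueError("region keys do not match between the two tables")
    return merged


d1 = pd.DataFrame({"REGION": ["R1", "R2"], "LAND AFFECTED BY WILDFIRES": [5.0, 0.0]})
d2 = pd.DataFrame({"REGION": ["R2", "R1"], "Ski resorts": [0.0, 2.0]})
chile_data = combine_chile_data(d1, d2)
print(chile_data)
```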

4 Principal Component Analysis (PCA)

Eigenvalues and eigenvectors

According to the result above, there are 15 principal components.

The result above shows that the cumulative explained variance increases with each component, but most of the variance is explained by the first few components.
For instance, principal component 1 explains about 33% of the variance, while principal components 6 and 7 explain about 5% and 4%, respectively.

I want to select the components that capture most of the variance.

Thus, I will keep only the first 6 components, which together explain 76.3% of the variance.
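The eigenvalue route can be sketched with NumPy: standardize each column, eigen-decompose the resulting correlation matrix, and sort components by eigenvalue. The helper `n_components_for` is a hypothetical way to make the cutoff explicit (the smallest k whose cumulative explained variance reaches a threshold); the choice of 6 components above was made by inspecting the numbers, and the random matrix below is only a stand-in for the real data.

```python
import numpy as np


def pca_eig(X: np.ndarray):
    """PCA by eigen-decomposition of the correlation matrix (rows = regions, columns = variables)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    corr = np.cov(Z, rowvar=False)                    # covariance of standardized data = correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()                   # explained-variance ratio per component
    return eigvals, eigvecs, ratio


def n_components_for(ratio: np.ndarray, threshold: float) -> int:
    """Smallest k such that the first k components explain at least `threshold` of the variance."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(ratio))


rng = np.random.default_rng(0)
X = rng.normal(size=(16, 5))  # placeholder data, not the Chile dataset
eigvals, eigvecs, ratio = pca_eig(X)
k = n_components_for(ratio, 0.763)
print(np.round(ratio, 3), k)
```

scikit-learn's `PCA` exposes the same quantities as `explained_variance_ratio_` and `components_` (the latter stores eigenvectors as rows rather than columns).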

5 Develop scoring system for 10 dimensions

5.1 - Calculate a weighted average for each variable in principal components.

For each feature, multiply the percentage of variance explained by each selected principal component by the percentage contribution of that feature to the component, and combine these products across the selected components. The resulting weighted average is stored as a new column in the dataframe of principal components.
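One way to express this step in code, under two assumptions the text does not state: a feature's percentage value in a component is taken to be its squared loading (unit-length eigenvectors make these shares sum to 1 within each component), and the weighted average divides by the total variance explained by the selected components. `feature_weights` is a hypothetical helper name.

```python
import numpy as np


def feature_weights(eigvecs: np.ndarray, ratio: np.ndarray, k: int) -> np.ndarray:
    """Variance-weighted average share of each feature over the first k principal components.

    eigvecs: (n_features, n_components) unit-length eigenvectors, columns sorted by variance.
    ratio:   explained-variance ratio of each component, in the same order.
    """
    shares = eigvecs[:, :k] ** 2  # assumed share of each feature in each component; columns sum to 1
    r = ratio[:k]
    return shares @ r / r.sum()   # weighted average across components; the weights sum to 1


# Tiny hand-checkable case: each feature is its own component.
w = feature_weights(np.eye(3), np.array([0.5, 0.3, 0.2]), k=2)
print(w)  # values: 0.625, 0.375, 0.0
```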

5.2 - Calculate a score for each dimension.

Multiply the weighted average of each variable by every standardized value in that variable's column, then sum the results across the dimension's variables to obtain the final score for each region.
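A minimal sketch of one dimension's score and ranking, assuming `Z` holds the standardized variables (one row per region), `weights` holds the weighted averages from step 5.1 indexed by variable name, and the dimension's variables are listed explicitly. Region and column names below are hypothetical.

```python
import pandas as pd


def dimension_score(Z: pd.DataFrame, weights: pd.Series, columns: list) -> pd.Series:
    """Per region: sum over the dimension's variables of weight * standardized value."""
    return Z[columns].mul(weights[columns], axis=1).sum(axis=1)


Z = pd.DataFrame(
    {"var_a": [1.0, -1.0, 0.0], "var_b": [0.5, 0.5, -1.0]},
    index=["R1", "R2", "R3"],
)
weights = pd.Series({"var_a": 0.6, "var_b": 0.4})
score = dimension_score(Z, weights, ["var_a", "var_b"])
ranking = score.sort_values(ascending=False)  # highest score ranks first
print(ranking)
```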

Ranking for dimension 01 CULTURAL HERITAGE AND EVENTS

Ranking for dimension 02 NATURAL RESOURCES AND SUSTAINABILITY

Ranking for dimension 04 TOURISM INFRASTRUCTURE

Ranking for dimension 05 TOURISM MOBILITY AND TRANSPORTATION INFRASTRUCTURE

Ranking for dimension 07 SECURITY AND SAFETY

Ranking for dimension 08 ECONOMIC PERFORMANCE

Ranking for dimension 09 TOURISM PROMOTION

Ranking for dimension 10 GOVERNMENTAL INVOLVEMENT AND EFFICIENCY

5.3 - Create an aggregated dataframe with all scores
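The per-dimension scores can be collected into a single region-by-dimension table with per-dimension ranks alongside. A sketch assuming each score is a pandas Series indexed by region; the dictionary keys reuse dimension names from the text, while the regions and values are made up.

```python
import pandas as pd


def aggregate_scores(scores: dict) -> pd.DataFrame:
    """Combine {dimension name: score Series} into one table of scores plus per-dimension ranks."""
    table = pd.DataFrame(scores)  # columns = dimensions, rows aligned on region labels
    # Rank 1 = highest score; astype(int) assumes no region is missing a score.
    ranks = table.rank(ascending=False, method="min").astype(int).add_suffix(" RANK")
    return pd.concat([table, ranks], axis=1)


scores = {
    "01 CULTURAL HERITAGE AND EVENTS": pd.Series({"R1": 0.8, "R2": -0.4}),
    "07 SECURITY AND SAFETY": pd.Series({"R1": -0.1, "R2": 0.3}),
}
all_scores = aggregate_scores(scores)
print(all_scores)
```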

6 Conclusion Outline

Extra research done:

Suggestion Overview

Focus on 2 dimensions:

Collect data for additional 2 dimensions:

Post COVID strategies:

7 Table manipulation for Tableau visualization
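The exact manipulation is not shown here. One common preparation, sketched as an assumption, is reshaping the wide score table into long format (one row per region and dimension), which makes it easy to filter and color by dimension in Tableau. The column names `REGION`, `DIMENSION`, and `SCORE` are hypothetical.

```python
import pandas as pd


def to_long(table: pd.DataFrame, id_name: str = "REGION") -> pd.DataFrame:
    """Reshape a wide region x dimension table into rows of (region, dimension, score)."""
    return (
        table.rename_axis(id_name)
        .reset_index()
        .melt(id_vars=id_name, var_name="DIMENSION", value_name="SCORE")
    )


wide = pd.DataFrame(
    {"01 CULTURAL HERITAGE AND EVENTS": [0.8, -0.4], "07 SECURITY AND SAFETY": [-0.1, 0.3]},
    index=["R1", "R2"],
)
long_scores = to_long(wide)
print(long_scores)
# long_scores.to_csv("scores_long.csv", index=False)  # hypothetical file name for a Tableau data source
```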

8 Appendix - Relevant Visualization & Slides

Menti 1.png

Menti 2.png

Tableau 1.png

Tableau 3.png

Tableau 5.png

Tableau 2.png

Tableau 4.png

Screen Shot 2020-11-24 at 9.37.20 AM.png

Screen Shot 2020-11-24 at 9.38.09 AM-2.png

Screen Shot 2020-11-24 at 9.38.14 AM.png

Screen Shot 2020-11-24 at 9.38.20 AM-2.png

Screen Shot 2020-11-24 at 9.38.25 AM.png

Screen Shot 2020-11-24 at 9.38.28 AM-2.png

Screen Shot 2020-11-24 at 9.38.33 AM.png

Screen Shot 2020-11-24 at 9.38.37 AM-2.png

Screen Shot 2020-11-24 at 9.38.41 AM.png

Screen Shot 2020-11-24 at 9.38.46 AM-2.png

Screen Shot 2020-11-24 at 9.38.52 AM.png

Screen Shot 2020-11-24 at 9.39.02 AM.png

Screen Shot 2020-11-24 at 9.57.52 AM.png